- Path: hydra.zrz.TU-Berlin.DE!rawneiha
- From: rawneiha@hydra.zrz.TU-Berlin.DE (Philipp Boerker)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: Texture/Gouraud innerloop speedtests
- Date: 19 Feb 1996 14:38:55 GMT
- Organization: Technical University Berlin, Germany
- Message-ID: <4ga21v$lsk@brachio.zrz.TU-Berlin.DE>
- References: <38232464@kone.fipnet.fi>
- NNTP-Posting-Host: hydra.zrz.tu-berlin.de
-
- "Jyrki Saarinen" <jsaarinen@kone.fipnet.fi> writes:
-
-
- >Ok, I did a little research. My CPU is a 40MHz 68040,
- >a Warp Engine with a very fast memory system, maybe
- >this is the reason I did not gain any speed even if
- >I turned the data cache and thus data burst off,
- >with data burst everything was about 50% slower.
-
- Not very surprising! Data burst means that whenever
- a cache miss occurs the CPU loads the whole cache line,
- 4 longwords (16 bytes), that contains the address being
- fetched. Texel fetches in a tmapping loop tend to jump
- around the texture, so almost every pixel that is fetched
- from the texture misses the cache and the CPU keeps the
- bus busy for 4 mem cycles instead of one!
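
To put rough numbers on this, here is a minimal sketch in C of a very crude bus-traffic model. It is only an illustration under stated assumptions: the function names are hypothetical, it counts longword transfers rather than clock cycles, and it ignores that the later transfers of a burst are typically cheaper than independent accesses.

```c
/* Crude model, counting longword bus transfers only (not clock cycles).
 * Assumption: a cache miss with burst enabled fills one 16-byte line,
 * i.e. 4 longword transfers; cache hits cost no bus traffic. */
unsigned long transfers_with_burst(unsigned long misses)
{
    return misses * 4ul;
}

/* Assumption: with the data cache switched off, every texel fetch
 * is a single bus transfer. */
unsigned long transfers_without_cache(unsigned long fetches)
{
    return fetches;
}

/* Returns 1 if, in this model, bursting moves less data than running
 * uncached; that needs fewer than one miss per four fetches. */
int burst_pays_off(unsigned long fetches, unsigned long misses)
{
    return transfers_with_burst(misses) < transfers_without_cache(fetches);
}
```

For a 320x256 screen (81920 texel fetches) that misses on every fetch, the model gives 327680 transfers with burst against 81920 without. In this model burst only moves less data once more than 75% of the fetches hit the cache.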
-
-
- >So the frame rates were for a 320x256 screen:
- >Texture/Gouraud/Shading table, 64k aligned: ~43 fps
- >Plain Texture, 64k aligned: ~67 fps
-
- fps? Are these figures for the mere repetition (320*256 times)
- of the innerloop?
-
- > move.b (a3),d1
- > move.l d1,a3
- > add.l a2,a1
- > move.b (a3),(a0)+
- > dbf d7,poly
- > rts
-
- >The places where scheduling was most effective were the
- > move.l d0,a3
- > <something here is a must>
- > move.b (a3),d1
- > <or>
- > move.b (a3),(a0)+
-
- > move.b (a3,d0.l),d1
- > move.b (a4,d1.l),(a0)+
- [...]
- > dbf d7,poly
- > rts
-
- If I understand your problem right, you wonder why the
- two versions are almost equal in terms of speed? The scheduling
- is not optimal in either version: in both of them the data you
- fetch is used by the very next instruction.
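
The idea of separating a fetch from its use can be sketched in C as follows. This is not the original poster's loop: the 256x256 texture layout, the 8.8 fixed-point u/v stepping and all function names are assumptions made for the illustration, and a C compiler is free to reorder the loads, so the sketch shows the intended ordering rather than a guaranteed instruction schedule.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 256x256 texture, u and v in 8.8 fixed point. */
static inline uint16_t texel_offset(uint16_t u, uint16_t v)
{
    return (uint16_t)((v & 0xFF00u) | (u >> 8));
}

/* Dependent version: the fetched texel is used by the very next lookup. */
void span_dependent(uint8_t *dst, size_t n, const uint8_t *tex,
                    const uint8_t *shade, uint16_t u, uint16_t v,
                    uint16_t du, uint16_t dv)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t t = tex[texel_offset(u, v)];
        dst[i] = shade[t];              /* needs t immediately */
        u = (uint16_t)(u + du);
        v = (uint16_t)(v + dv);
    }
}

/* Pipelined version: the texel for pixel i+1 is fetched before the
 * shade lookup for pixel i, so no lookup waits on the load just issued. */
void span_pipelined(uint8_t *dst, size_t n, const uint8_t *tex,
                    const uint8_t *shade, uint16_t u, uint16_t v,
                    uint16_t du, uint16_t dv)
{
    if (n == 0)
        return;
    uint8_t t = tex[texel_offset(u, v)];    /* texel for pixel 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        u = (uint16_t)(u + du);
        v = (uint16_t)(v + dv);
        uint8_t next = tex[texel_offset(u, v)]; /* fetch for pixel i+1 */
        dst[i] = shade[t];                      /* use the older texel */
        t = next;
    }
    dst[n - 1] = shade[t];                  /* last pixel */
}
```

Both functions write the same pixels. Only the distance between each texture fetch and the shade-table lookup that consumes it differs.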
-
- Greets,
- Phil.
-
- grond/matrix
-
-